Skip to content

Conversation

@rubenruizdegauna
Copy link
Member

This pull request adds a new validation step to the Buildkite pipeline to ensure that the agentless Docker image is built for all required architectures before running service tests. The main change is the introduction of a script that checks the presence of both amd64 and arm64 architectures in the Docker image manifest for the specified environment.

Pipeline enhancements:

  • Added a new pipeline step labeled ":docker: Validate docker image is built for all architectures" in .buildkite/pipeline.agentless-tests.yaml to run the validation script before service tests.

Docker image validation:

  • Introduced .buildkite/scripts/steps/validate-agentless-docker-image.sh, a script that:
    • Extracts the relevant image SHA from the environment configuration.
    • Inspects the Docker image manifest to determine available architectures.
    • Validates that both amd64 and arm64 architectures are present, failing the build if any are missing.

@mergify
Copy link
Contributor

mergify bot commented Dec 1, 2025

This pull request does not have a backport label. Could you fix it @rubenruizdegauna? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

v1v
v1v previously approved these changes Dec 1, 2025
Copy link
Member

@v1v v1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some minor comments with using container agents for faster builds and CODEOWNERS to help with notifying the corresponding team, instead the robots team

- label: ":docker: Validate docker image is built for all architectures"
command: ".buildkite/scripts/steps/validate-agentless-docker-image.sh"
agents:
provider: "gcp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

IIUC, we use this provider since it requires using Docker in it. Wonder whether we could use a container agent here with docker on it? like https://hub.docker.com/_/docker

I have not tested btw

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but I wasn't sure which one to use.

My first approach was to try docker.elastic.co/ci-agent-images/quality-gate-seedling and docker.elastic.co/ci-agent-images/platform-ingest/buildkite-agent-beats-ci-with-hooks:0.5 but none of the have docker.

Then reading the docs I found this

https://docs.elastic.dev/ci/getting-started-with-buildkite-at-elastic#run-docker-in-a-build

Since container management is not supported in the Kubernetes offering, in order to use docker-like commands, you need to use a GCP virtual machine agent.

So I understood that this was the correct way.

Is it ok to use image that are not from docker.elastic.co?

If that's the case, I can give it a try to https://hub.docker.com/_/docker

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly prefer if we used something from docker.elastic.co (reasons also include failure domains etc). I have yet to find any image of ours that have docker-in-docker in it.

In this case though, we don't technically need docker, but any of the manifest tooling could achieve this, e.g using oras (going from bash/oras/yq to pseudo-code so bear with me):

$ image_archs=$(oras manifest fetch docker.elastic.co/observability-ci/ecp-elastic-agent-service:git-b1d4e25c181a | yq '[.manifests[].platform.architecture] | sort | join(", ")')
$ echo $image_archs
amd64, arm64
# assert image_archs == amd64, arm64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified to use custom image docker.elastic.co/ci-agent-images/observability/oci-image-tools-agent

(created in https://github.com/elastic/ci-agent-images/pull/2052)

@@ -0,0 +1,111 @@
#!/usr/bin/env bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add an entry in the CODEOWNERS with the team responsible for this script and probably .buildkite/pipeline.agentless-tests.yaml?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added in 36c8137

setup_extract_sha() {
# Ensure repo is available - redirect output to /dev/null
if [ ! -d "serverless-gitops" ]; then
git clone --depth 1 [email protected]:elastic/serverless-gitops.git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you confirm whether this is accessible in BK? I cannot find the relevant GH check for testing this PR Maybe I'm blind. Do you mind pointing where I can see it? 🙇

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was inspired by https://github.com/elastic/elastic-agent/blob/main/.buildkite/scripts/steps/run-agentless-tests.sh#L26 which is triggered from the same bk job, so permissions should be correct.

for testing this PR
TBH I haven't tested this in the pipeline. Just locally assuming that the permissions will work as the same checkout is done in the agentless-tests.

But I can create a branch instead of a fork and trigger a test :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok, I assume both scripts are triggered in the same bk pipeline.

I somehow thought it runs automatically for forked PRs too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a good idea in any case, as I found a couple of issues (yq not present and missing wait between steps), so thanks for the ping ;-)

@rubenruizdegauna rubenruizdegauna requested a review from v1v December 1, 2025 14:55
ycombinator
ycombinator previously approved these changes Dec 1, 2025
Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock PR from merging (needs approval from @elastic/elastic-agent-control-plane until the CODEOWNERS changes in this PR take effect) but will defer to someone from @elastic/ingest-managed-jobs for a second set of eyes.

swiatekm
swiatekm previously approved these changes Dec 2, 2025
Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm not overly enthused about this being a bash script, but tying it into agent build tooling would probably be worse in the long run.

setup_extract_sha

# Extract the SHA for the specified environment
COMMIT_SHA=$(extract_sha "$ENVIRONMENT")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can replace everything up to this point with ${SERVICE_VERSION}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I checked the BK env vars and SERVICE_VERSION is there so I removed the code that was retrieving it.

- label: ":docker: Validate docker image is built for all architectures"
command: ".buildkite/scripts/steps/validate-agentless-docker-image.sh"
agents:
provider: "gcp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly prefer if we used something from docker.elastic.co (reasons also include failure domains etc). I have yet to find any image of ours that have docker-in-docker in it.

In this case though, we don't technically need docker, but any of the manifest tooling could achieve this, e.g using oras (going from bash/oras/yq to pseudo-code so bear with me):

$ image_archs=$(oras manifest fetch docker.elastic.co/observability-ci/ecp-elastic-agent-service:git-b1d4e25c181a | yq '[.manifests[].platform.architecture] | sort | join(", ")')
$ echo $image_archs
amd64, arm64
# assert image_archs == amd64, arm64

@rubenruizdegauna rubenruizdegauna dismissed stale reviews from swiatekm and ycombinator via f6432ab December 3, 2025 16:26
@rubenruizdegauna rubenruizdegauna marked this pull request as draft December 3, 2025 17:22
@elasticmachine
Copy link
Contributor

💚 Build Succeeded

History

cc @rubenruizdegauna

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants